Goto

Collaborating Authors

 restaurant process


Spatial distance dependent Chinese restaurant processes for image segmentation

Neural Information Processing Systems

The distance dependent Chinese restaurant process (ddCRP) was recently introduced to accommodate random partitions of non-exchangeable data [1]. The dd-CRP clusters data in a biased way: each data point is more likely to be clustered with other data that are near it in an external sense. This paper examines the dd-CRP in a spatial setting with the goal of natural image segmentation. We explore the biases of the spatial ddCRP model and propose a novel hierarchical extension better suited for producing "human-like" segmentations. We then study the sensitivity of the models to various distance and appearance hyperparameters, and provide the first rigorous comparison of nonparametric Bayesian models in the image segmentation domain. On unsupervised image segmentation, we demonstrate that similar performance to existing nonparametric Bayesian models is possible with substantially simpler models and algorithms.


Comparing Nonparametric Bayesian Tree Priors for Clonal Reconstruction of Tumors

Deshwar, Amit G., Vembu, Shankar, Morris, Quaid

arXiv.org Machine Learning

Statistical machine learning methods, especially nonparametric Bayesian methods, have become increasingly popular to infer clonal population structure of tumors. Here we describe the treeCRP, an extension of the Chinese restaurant process (CRP), a popular construction used in nonparametric mixture models, to infer the phylogeny and genotype of major subclonal lineages represented in the population of cancer cells. We also propose new split-merge updates tailored to the subclonal reconstruction problem that improve the mixing time of Markov chains. In comparisons with the tree-structured stick breaking prior used in PhyloSub, we demonstrate superior mixing and running time using the treeCRP with our new split-merge procedures. We also show that given the same number of samples, TSSB and treeCRP have similar ability to recover the subclonal structure of a tumor.


Transfer Topic Modeling with Ease and Scalability

Kang, Jeon-Hyung, Ma, Jun, Liu, Yan

arXiv.org Machine Learning

The increasing volume of short texts generated on social media sites, such as Twitter or Facebook, creates a great demand for effective and efficient topic modeling approaches. While latent Dirichlet allocation (LDA) can be applied, it is not optimal due to its weakness in handling short texts with fast-changing topics and scalability concerns. In this paper, we propose a transfer learning approach that utilizes abundant labeled documents from other domains (such as Yahoo! News or Wikipedia) to improve topic modeling, with better model fitting and result interpretation. Specifically, we develop Transfer Hierarchical LDA (thLDA) model, which incorporates the label information from other domains via informative priors. In addition, we develop a parallel implementation of our model for large-scale applications. We demonstrate the effectiveness of our thLDA model on both a microblogging dataset and standard text collections including AP and RCV1 datasets.


Spatial distance dependent Chinese restaurant processes for image segmentation

Ghosh, Soumya, Ungureanu, Andrei B., Sudderth, Erik B., Blei, David M.

Neural Information Processing Systems

The distance dependent Chinese restaurant process (ddCRP) was recently introduced toaccommodate random partitions of non-exchangeable data [1]. The dd-CRP clusters data in a biased way: each data point is more likely to be clustered with other data that are near it in an external sense. This paper examines the dd-CRP in a spatial setting with the goal of natural image segmentation. We explore the biases of the spatial ddCRP model and propose a novel hierarchical extension bettersuited for producing "humanlike" segmentations. We then study the sensitivity of the models to various distance and appearance hyperparameters, and provide the first rigorous comparison of nonparametric Bayesian models in the image segmentationdomain. On unsupervised image segmentation, we demonstrate that similar performance to existing nonparametric Bayesian models is possible with substantially simpler models and algorithms.


Hierarchical Topic Models and the Nested Chinese Restaurant Process

Griffiths, Thomas L., Jordan, Michael I., Tenenbaum, Joshua B., Blei, David M.

Neural Information Processing Systems

We address the problem of learning topic hierarchies from data. The model selection problem in this domain is daunting--which of the large collection of possible trees to use? We take a Bayesian approach, generating anappropriate prior via a distribution on partitions that we refer to as the nested Chinese restaurant process. This nonparametric prior allows arbitrarilylarge branching factors and readily accommodates growing data collections. We build a hierarchical topic model by combining this prior with a likelihood that is based on a hierarchical variant of latent Dirichlet allocation. We illustrate our approach on simulated data and with an application to the modeling of NIPS abstracts.


Hierarchical Topic Models and the Nested Chinese Restaurant Process

Griffiths, Thomas L., Jordan, Michael I., Tenenbaum, Joshua B., Blei, David M.

Neural Information Processing Systems

We address the problem of learning topic hierarchies from data. The model selection problem in this domain is daunting--which of the large collection of possible trees to use? We take a Bayesian approach, generating an appropriate prior via a distribution on partitions that we refer to as the nested Chinese restaurant process. This nonparametric prior allows arbitrarily large branching factors and readily accommodates growing data collections. We build a hierarchical topic model by combining this prior with a likelihood that is based on a hierarchical variant of latent Dirichlet allocation. We illustrate our approach on simulated data and with an application to the modeling of NIPS abstracts.


Hierarchical Topic Models and the Nested Chinese Restaurant Process

Griffiths, Thomas L., Jordan, Michael I., Tenenbaum, Joshua B., Blei, David M.

Neural Information Processing Systems

We address the problem of learning topic hierarchies from data. The model selection problem in this domain is daunting--which of the large collection of possible trees to use? We take a Bayesian approach, generating an appropriate prior via a distribution on partitions that we refer to as the nested Chinese restaurant process. This nonparametric prior allows arbitrarily large branching factors and readily accommodates growing data collections. We build a hierarchical topic model by combining this prior with a likelihood that is based on a hierarchical variant of latent Dirichlet allocation. We illustrate our approach on simulated data and with an application to the modeling of NIPS abstracts.